.. _`K-means Clustering`:

.. _`org.sysess.sympathy.machinelearning.k_means`:

K-means Clustering
``````````````````

.. image:: dataset_blobs.svg
   :width: 48

Clusters data by trying to separate samples in n groups of equal variance.

Documentation
:::::::::::::

Attributes
==========

**cluster_centers_**
    Coordinates of cluster centers. If the algorithm stops before fully
    converging (see ``tol`` and ``max_iter``), these will not be consistent
    with ``labels_``.

**inertia_**
    Sum of squared distances of samples to their closest cluster center,
    weighted by the sample weights if provided.

**labels_**
    Labels of each point.

Definition
::::::::::

Output ports
============

**model** model
    Model

Configuration
=============

**K-means algorithm** (algorithm)
    K-means algorithm to use. The classical EM-style algorithm is
    `"lloyd"`. The `"elkan"` variation can be more efficient on some
    datasets with well-defined clusters, by using the triangle inequality.
    However, it is more memory intensive due to the allocation of an extra
    array of shape `(n_samples, n_clusters)`.

    .. versionchanged:: 0.18
        Added Elkan algorithm

    .. versionchanged:: 1.1
        Renamed "full" to "lloyd", and deprecated "auto" and "full".
        Changed "auto" to use "lloyd" instead of "elkan".

**Initialization method** (init)
    Method for initialization:

    * 'k-means++': selects initial cluster centroids using sampling based on
      an empirical probability distribution of the points' contribution to
      the overall inertia. This technique speeds up convergence. The
      algorithm implemented is "greedy k-means++". It differs from the
      vanilla k-means++ by making several trials at each sampling step and
      choosing the best centroid among them.

    * 'random': choose `n_clusters` observations (rows) at random from the
      data for the initial centroids.

    * If an array is passed, it should be of shape
      ``(n_clusters, n_features)`` and gives the initial centers.

    * If a callable is passed, it should take arguments ``X``, `n_clusters`
      and a random state, and return an initialization.
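    As a minimal sketch of these options (this node wraps scikit-learn, so
    the snippet below calls ``sklearn.cluster.KMeans`` directly; the toy
    data and seeds are illustrative assumptions):

    .. code-block:: python

        import numpy as np
        from sklearn.cluster import KMeans

        # Toy data: two well-separated blobs (illustrative assumption).
        rng = np.random.RandomState(0)
        X = np.vstack([rng.normal(0.0, 0.5, size=(50, 2)),
                       rng.normal(5.0, 0.5, size=(50, 2))])

        # Built-in strategies: 'k-means++' (greedy k-means++) and 'random'.
        km_pp = KMeans(n_clusters=2, init="k-means++", n_init=10,
                       random_state=0).fit(X)
        km_rand = KMeans(n_clusters=2, init="random", n_init=10,
                         random_state=0).fit(X)

        # An explicit array of shape (n_clusters, n_features) is also valid.
        centers = np.array([[0.0, 0.0], [5.0, 5.0]])
        km_arr = KMeans(n_clusters=2, init=centers, n_init=1).fit(X)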
    For an example of how to use the different `init` strategies, see the
    example entitled sphx_glr_auto_examples_cluster_plot_kmeans_digits.py.

**Maximum number of iterations** (max_iter)
    Maximum number of iterations of the k-means algorithm for a single run.

**Number of clusters/centroids** (n_clusters)
    The number of clusters to form as well as the number of centroids to
    generate. For an example of how to choose an optimal value for
    `n_clusters`, refer to
    sphx_glr_auto_examples_cluster_plot_kmeans_silhouette_analysis.py.

**Number of runs** (n_init)
    Number of times the k-means algorithm is run with different centroid
    seeds. The final result is the best output of `n_init` consecutive runs
    in terms of inertia. Several runs are recommended for sparse
    high-dimensional problems (see kmeans_sparse_high_dim). When
    `n_init='auto'`, the number of runs depends on the value of `init`:
    10 if using `init='random'` or `init` is a callable; 1 if using
    `init='k-means++'` or `init` is an array-like.

    .. versionadded:: 1.2
        Added 'auto' option for `n_init`.

    .. versionchanged:: 1.4
        Default value for `n_init` changed to `'auto'`.

**Random seed** (random_state)
    Determines random number generation for centroid initialization. Use an
    int to make the randomness deterministic. See random_state.

**Tolerance** (tol)
    Relative tolerance with regard to the Frobenius norm of the difference
    in the cluster centers of two consecutive iterations to declare
    convergence.

Implementation
==============

.. automodule:: node_clustering
    :noindex:

.. class:: KMeansClustering
    :noindex:
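Since the configuration options above map directly onto the keyword
arguments of scikit-learn's ``KMeans`` estimator, an end-to-end sketch may
help tie them to the fitted attributes described earlier. The data, seed,
and parameter values below are illustrative assumptions:

.. code-block:: python

    import numpy as np
    from sklearn.cluster import KMeans

    # Three well-separated blobs along the diagonal (illustrative data).
    rng = np.random.RandomState(42)
    X = np.vstack([rng.normal(loc, 0.3, size=(40, 2))
                   for loc in (0.0, 4.0, 8.0)])

    model = KMeans(
        n_clusters=3,        # Number of clusters/centroids
        init="k-means++",    # Initialization method
        n_init=10,           # Number of runs
        max_iter=300,        # Maximum number of iterations
        tol=1e-4,            # Tolerance
        algorithm="lloyd",   # K-means algorithm
        random_state=0,      # Random seed
    ).fit(X)

    # Fitted attributes described under "Attributes" above:
    print(model.cluster_centers_.shape)  # (3, 2)
    print(model.labels_.shape)           # one label per sample: (120,)
    print(model.inertia_)                # sum of squared distances to centers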